Noise tolerant algorithms for learning and searching
Abstract
We consider the problem of developing robust algorithms which cope with noisy data. In the Probably Approximately Correct model of machine learning, we develop a general technique which allows nearly all PAC learning algorithms to be converted into highly efficient PAC learning algorithms which tolerate noise. In the field of combinatorial algorithms, we develop techniques for constructing search algorithms which tolerate linearly bounded errors and probabilistic errors.

In the field of machine learning, we derive general bounds on the complexity of learning in the recently introduced Statistical Query model and in the PAC model with noise. We do so by considering the problem of improving the accuracy of learning algorithms. In particular, we study the problem of "boosting" the accuracy of "weak" learning algorithms which fall within the Statistical Query model, and we show that it is possible to improve the accuracy of such learning algorithms to any arbitrary accuracy. We derive a number of interesting consequences from this result, and in particular, we show that nearly all PAC learning algorithms can be converted into highly efficient PAC learning algorithms which tolerate classification noise and malicious errors.

We also investigate the longstanding problem of searching in the presence of errors. We consider the problem of determining an unknown quantity x by asking "yes-no" questions, where some of the answers may be erroneous. We focus on two different models of error: the linearly bounded model, where for some known constant r < 1/2, each initial sequence of i answers is guaranteed to have no more than ri errors, and the probabilistic model, where errors occur randomly and independently with probability p < 1/2. We develop highly efficient algorithms for searching in the presence of linearly bounded errors, and we further show that searching in the presence of probabilistic errors can be efficiently reduced to searching in the presence of linearly bounded errors.
Thesis Supervisor: Ronald L. Rivest
Title: Professor of Computer Science
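The two error models named in the abstract can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: it checks the linearly bounded condition on a sequence of answers, simulates the probabilistic model, and runs a naive repeat-and-majority-vote binary search as a baseline. It is not the thesis's algorithm, and the names `within_linear_bound`, `noisy_oracle`, and `majority_search` are hypothetical.

```python
import random
from typing import Callable, Sequence


def within_linear_bound(is_error: Sequence[bool], r: float) -> bool:
    """Return True if every prefix of length i contains at most r*i errors."""
    errors = 0
    for i, wrong in enumerate(is_error, start=1):
        errors += wrong
        if errors > r * i:
            return False
    return True


def noisy_oracle(x: int, p: float, rng: random.Random) -> Callable[[int], bool]:
    """Build an oracle for 'is x >= guess?' that lies independently with probability p."""
    def ask(guess: int) -> bool:
        truth = x >= guess
        return truth if rng.random() >= p else not truth
    return ask


def majority_search(ask: Callable[[int], bool], n: int, repeats: int) -> int:
    """Binary search over 0..n-1, asking each question `repeats` times
    and trusting the majority answer (a naive baseline, assumed here)."""
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        yes = sum(ask(mid) for _ in range(repeats))
        if 2 * yes > repeats:
            lo = mid       # majority says x >= mid
        else:
            hi = mid - 1   # majority says x < mid
    return lo


if __name__ == "__main__":
    rng = random.Random(0)
    ask = noisy_oracle(x=37, p=0.1, rng=rng)
    print(majority_search(ask, n=100, repeats=101))
```

Note that under the linearly bounded model with r < 1/2 the very first answer must be correct, since a single error in a prefix of length 1 already exceeds r. The repetition baseline multiplies the number of questions by `repeats`; the abstract's claim of highly efficient algorithms suggests the thesis does better than this, but those algorithms are not reproduced here.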
Similar resources
Efficient Noise-tolerant Learning from Statistical Queries
In this paper, we study the extension of Valiant's learning model [32] in which the positive or negative classification label provided with each random example may be corrupted by random noise. This extension was first examined in the learning theory literature by Angluin and Laird [1], who formalized the simplest type of white label noise and then sought algorithms tolerating the highest possible...
An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...
Statistical Query Learning (1993; Kearns)
The problem deals with learning {−1, +1}-valued functions from random labeled examples in the presence of random noise in the labels. In the random classification noise model of Angluin and Laird [1] the label of each example given to the learning algorithm is flipped randomly and independently with some fixed probability η called the noise rate. The model is the extension of Valiant’s PAC m...
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...
Statistical Active Learning Algorithms
We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We...
Publication date: 1995